Ambiguity Resolution for Machine Translation of Telegraphic Messages

Authors

Young-Suk Lee

Clifford J. Weinstein

Stephanie Seneff

Dinesh Tummala

Abstract

Telegraphic messages with numerous instances of omission pose a new challenge to parsing in that a sentence with omission causes a higher degree of ambiguity than a sentence without omission. Misparsing induced by omissions has a far-reaching consequence in machine translation. Namely, a misparse of the input often leads to a translation into the target language which has incoherent meaning in the given context. This is more frequently the case if the structures of the source and target languages are quite different, as in English and Korean. Thus, the question of how we parse telegraphic messages accurately and efficiently becomes a critical issue in machine translation. In this paper we describe a technical solution for the issue, and present the performance evaluation of a machine translation system on telegraphic messages before and after adopting the proposed solution. The solution lies in a grammar design in which lexicalized grammar rules defined in terms of semantic categories and syntactic rules defined in terms of part-of-speech are utilized together. The proposed grammar achieves a higher parsing coverage without increasing the amount of ambiguity/misparsing when compared with a purely lexicalized semantic grammar, and achieves a lower degree of ambiguity/misparses without decreasing the parsing coverage when compared with a purely syntactic grammar.

1 Introduction

Achieving the goal of producing high quality machine translation output is hindered by lexical and syntactic ambiguity of the input sentences. Lexical ambiguity may be greatly reduced by limiting the domain to be translated. However, the same is not generally true for syntactic ambiguity. In particular, telegraphic messages, such as military operations reports, pose a new challenge to parsing in that frequently occurring ellipses in the corpus induce a higher degree of syntactic ambiguity than for text written in "grammatical" English.

Misparsing triggered by the ambiguity of the input sentence often leads to a mistranslation in a machine translation system. Therefore, the issue becomes how to parse telegraphic messages accurately and efficiently to produce high quality translation output. In general the syntactic ambiguity of an input text may be greatly reduced by introducing semantic categories in the grammar to capture the co-occurrence restrictions of the input string. In addition, ambiguity introduced by omission can be reduced by lexicalizing grammar rules to delimit the lexical items which typically occur in phrases with omission in the given domain. A drawback of this approach, however, is that the grammar coverage is quite low. On the other hand, grammar coverage may be maximized when we rely on syntactic rules defined in terms of part-of-speech, at the cost of a high degree of ambiguity. Thus, the goal of maximizing the parsing coverage while minimizing the ambiguity may be achieved by adequately combining lexicalized rules with semantic categories, and non-lexicalized rules with syntactic categories. The question is how much semantic and syntactic information is necessary to achieve such a goal.

In this paper we propose that an adequate amount of lexical information to reduce the ambiguity in general originates from verbs, which provide information on subcategorization, and prepositions, which are critical for PP-attachment ambiguity resolution. For the given domain, lexicalizing domain-specific expressions which typically occur in phrases with omission is adequate for ambiguity resolution. Our experimental results show that the mix of syntactic and semantic grammar as proposed here has advantages over either a syntactic grammar or a lexicalized semantic grammar.

Footnote 1: This work was sponsored by the Defense Advanced Research Projects Agency. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Air Force.

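The trade-off described above can be illustrated with a toy parse counter. The sketch below is not the paper's grammar or the TINA system: the category names (`VPP`, `ADVP`, `DIST_UNIT`, `DISTANCE`), the rule sets, and the function `count_parses` are all hypothetical, chosen only to show how a rule stated over part-of-speech tags can license two readings of a telegraphic message such as "TU-95 destroyed 220 nm", while a rule lexicalized with a semantic category for distance units leaves one.

```python
from functools import lru_cache


def count_parses(tokens, lexicon, rules, start="S"):
    """Count parse trees for `tokens` under a grammar in Chomsky normal form.

    lexicon: dict mapping a word to the set of categories it can bear.
    rules:   list of (parent, left, right) binary rules.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(cat, i, j):
        # A single-word span is covered only by a lexical entry.
        if j - i == 1:
            return 1 if cat in lexicon.get(tokens[i], ()) else 0
        total = 0
        for parent, left, right in rules:
            if parent != cat:
                continue
            # Try every split point of the span [i, j).
            for k in range(i + 1, j):
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(tokens))


SENTENCE = ["TU-95", "destroyed", "220", "nm"]

# Hypothetical part-of-speech grammar: "nm" is just a noun, so "220 nm"
# can be a direct object (active reading) or an adverbial measure phrase
# with an omitted preposition (passive reading with omitted "was").
POS_LEXICON = {
    "TU-95": {"NP"},
    "destroyed": {"V", "VPP"},  # past tense or past participle
    "220": {"NUM"},
    "nm": {"N"},
}
POS_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VPP", "ADVP"),
    ("NP", "NUM", "N"),
    ("ADVP", "NUM", "N"),
]

# Hypothetical mixed grammar: the omission rule is lexicalized so that
# only a distance unit can head the preposition-less measure phrase.
MIXED_LEXICON = dict(POS_LEXICON, nm={"DIST_UNIT"})
MIXED_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VPP", "DISTANCE"),
    ("NP", "NUM", "N"),
    ("DISTANCE", "NUM", "DIST_UNIT"),
]

print(count_parses(SENTENCE, POS_LEXICON, POS_RULES))      # 2
print(count_parses(SENTENCE, MIXED_LEXICON, MIXED_RULES))  # 1
```

In this toy setting the general syntactic rules (`S`, `VP`, `NP`) are shared by both grammars, and only the rule that handles the omission is restricted by a semantic category, which is the spirit of the combination argued for above.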
Compared with a syntactic grammar, the proposed grammar achieves a much lower degree of ambiguity without decreasing the grammar coverage. Compared with a lexicalized semantic grammar, the proposed grammar achieves a higher rate of parsing coverage without increasing the ambiguity. Furthermore, the generality introduced by the syntactic rules facilitates the porting of the system to other domains and enables the system to handle unknown words efficiently.

This paper is organized as follows. In section 2 we discuss the motivation for lexicalizing grammar rules with semantic categories in the context of translating telegraphic messages, and its drawbacks with respect to parsing coverage. In section 3 we propose a grammar writing technique which minimizes the ambiguity of the input and maximizes the parsing coverage. In section 4 we give our experimental results of the technique on the basis of two sets of unseen test data. In section 5 we discuss system engineering issues to accommodate the proposed technique, i.e., integration of a part-of-speech tagger and the adaptation of the understanding system. Finally, section 6 provides a summary of the paper.

2 Translation of Telegraphic Messages

Telegraphic messages contain many instances of phrases with omission, cf. (Grishman, 1989), as in (1). This introduces a greater degree of syntactic ambiguity than texts without any omitted element, thereby posing a new challenge to parsing.

(1) TU-95 destroyed 220 nm. (~ An aircraft TU-95 was destroyed at 220 nautical miles)

Syntactic ambiguity and the resultant misparse induced by such an omission often lead to a mistranslation in a machine translation system, such as the one described in (Weinstein et al., 1996), which is depicted in Figure 1. The system depicted in Figure 1 has a language understanding module, TINA (Seneff, 1992), and a language generation module

Similar resources

Lexical Ambiguity Resolution for Turkish in Direct Transfer Machine Translation Models

This paper presents a statistical lexical ambiguity resolution method in direct transfer machine translation models in which the target language is Turkish. Since direct transfer MT models do not have full syntactic information, most of the lexical ambiguity resolution methods are not very helpful. Our disambiguation model is based on statistical language models. We have investigated the perfor...

English-to-Korean Text Translation of Telegraphic Messages in a Limited Domain

This paper describes our work-in-progress in automatic English-to-Korean text translation. This work is an initial step toward the ultimate goal of text and speech translation for enhanced multilingual and multinational operations. For this purpose, we have adopted an interlingua approach with natural language understanding (TINA) and generation (GENESIS) modules at the core. We tackle the a...

Automatic English-to-Korean Text Translation of Telegraphic Messages in a Limited Domain

Translation Ambiguity Resolution Based On Text Corpora Of Source And Target Languages

We propose a new method to resolve ambiguity in translation and meaning interpretation using linguistic statistics extracted from dual corpora of source and target languages in addition to the logical restrictions described on dictionary and grammar rules for ambiguity resolution. It provides reasonable criteria for determining a suitable equivalent translation or meaning by making the depende...

Rule Based Machine Translation of Noun Phrases from Punjabi to English

The paper presents automatic translation of noun phrases from Punjabi to English using a transfer approach. The system has analysis, translation and synthesis components. The steps involved are preprocessing, tagging, ambiguity resolution, translation and synthesis of words in the target language. The accuracy is calculated for each step and the overall accuracy of the system is calculated to be abou...

Publication date: 1997
